OCPERT-331: Migrate Jenkins Stage pipeline to Prow jobs #76154

Open
tomasdavidorg wants to merge 6 commits into openshift:main from tomasdavidorg:OCPERT-331

@tomasdavidorg
Contributor

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 12, 2026

@tomasdavidorg: This pull request references OCPERT-331 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

https://issues.redhat.com/browse/OCPERT-331

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 12, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 12, 2026
@openshift-ci
Contributor

openshift-ci bot commented Mar 12, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@tomasdavidorg
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.21-stage-testing-e2e-aws-ipi

@openshift-ci-robot
Contributor

@tomasdavidorg: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@tomasdavidorg
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.21-stage-testing-e2e-aws-ipi

@openshift-ci-robot
Contributor

@tomasdavidorg: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@tomasdavidorg tomasdavidorg marked this pull request as ready for review March 12, 2026 14:34
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 12, 2026
@openshift-ci openshift-ci bot requested review from asood-rh and rioliu-rh March 12, 2026 14:42
@rioliu-rh
Contributor

rioliu-rh left a comment

The build_root field is set incorrectly across all stage-testing configs — it uses a hardcoded name: release / namespace: openshift / tag: rhel-9-release-golang-1.22-openshift-4.17 instead of the release-specific builder image. Please update each config to match the corresponding main release config (see inline comment on 4.12 as an example). The correct values per release are:

| Release | Tag |
|---------|-----|
| 4.12 | rhel-8-golang-1.19-openshift-4.12 |
| 4.13 | rhel-8-golang-1.19-openshift-4.13 |
| 4.14 | rhel-8-golang-1.20-openshift-4.14 |
| 4.15 | rhel-8-golang-1.20-openshift-4.15 |
| 4.16 | rhel-9-golang-1.21-openshift-4.16 |
| 4.17 | rhel-9-golang-1.22-openshift-4.17 |
| 4.18 | rhel-9-golang-1.22-openshift-4.18 |
| 4.19 | rhel-9-golang-1.23-openshift-4.19 |
| 4.20 | rhel-9-golang-1.24-openshift-4.20 |
| 4.21 | rhel-9-golang-1.24-openshift-4.21 |
| 4.22 | rhel-9-golang-1.24-openshift-4.22 |

All should use name: builder and namespace: ocp.
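
The table maps directly onto the build_root stanza of each config. Below is a minimal shell sketch of that mapping, assuming the usual ci-operator image_stream_tag layout. The function names builder_tag and render_build_root are hypothetical helpers for illustration and are not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an OCP release (e.g. "4.19") to the builder tag
# listed in the review table.
builder_tag() {
  case "$1" in
    4.12|4.13)      echo "rhel-8-golang-1.19-openshift-$1" ;;
    4.14|4.15)      echo "rhel-8-golang-1.20-openshift-$1" ;;
    4.16)           echo "rhel-9-golang-1.21-openshift-$1" ;;
    4.17|4.18)      echo "rhel-9-golang-1.22-openshift-$1" ;;
    4.19)           echo "rhel-9-golang-1.23-openshift-$1" ;;
    4.20|4.21|4.22) echo "rhel-9-golang-1.24-openshift-$1" ;;
    *) echo "unknown release: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print the build_root stanza for one release,
# always using name: builder and namespace: ocp.
render_build_root() {
  local tag
  tag="$(builder_tag "$1")" || return 1
  cat <<EOF
build_root:
  image_stream_tag:
    name: builder
    namespace: ocp
    tag: ${tag}
EOF
}

# Example: print the stanza for the 4.12 config.
render_build_root 4.12
```

Comparing this output against each stage-testing config is one way to spot a config that still carries the hardcoded 4.17 tag.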

@rioliu-rh
Contributor

rioliu-rh left a comment

Thanks for the PR. A few issues need to be addressed before merging — please see the inline comments for details. Summary:

  1. build_root should be release-specific using name: builder / namespace: ocp with the correct per-release golang tag (see inline comment on 4.12 config and the table in the earlier review comment)
  2. The workflow should be moved under the cucushift/ namespace to follow the convention used by all other openshift-tests-private workflows
  3. The env section in the workflow should be simplified to only TEST_SCENARIOS and TEST_TIMEOUT; TEST_IMPORTANCE: all is a bug, and TEST_FILTERS/FILTERS_ADDITIONAL are unnecessary
  4. The pre/post chains should use cucushift-installer-rehearse-aws-ipi-provision and cucushift-installer-rehearse-aws-ipi-deprovision instead of the generic ipi-aws-pre/ipi-aws-post, which are missing QE-specific steps and cause enable-qe-catalogsource to run twice

@rioliu-rh
Contributor

/cc @liangxia @jianlinliu

@openshift-ci openshift-ci bot requested review from jianlinliu and liangxia March 13, 2026 07:31
@openshift-ci
Contributor

openshift-ci bot commented Mar 18, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tomasdavidorg
Once this PR has been reviewed and has the lgtm label, please assign jianlinliu for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tomasdavidorg
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.21-stage-testing-e2e-aws-ipi

@openshift-ci-robot
Contributor

@tomasdavidorg: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@rioliu-rh
Contributor

The rehearsal job is failing with:

level=fatal msg=failed to fetch Common Manifests: failed to fetch dependency of "Common Manifests": failed to generate asset "DNS Config": getting public zone for "origin-ci-int-aws.dev.rhcloud.com": no public route53 zone found matching name "origin-ci-int-aws.dev.rhcloud.com"

Root cause: BASE_DOMAIN is missing from the env section of all 11 stage-testing configs. Without it, the installer falls back to the default domain origin-ci-int-aws.dev.rhcloud.com, which has no Route53 public zone in the QE/sustaining AWS accounts.
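
A quick way to confirm which configs are affected is to list the files that never mention BASE_DOMAIN. This is only a sketch: the function name configs_missing_base_domain and the default file glob `*__stage-testing.yaml` are assumptions for illustration, so adjust both to the actual layout of the config directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every config in a directory that has no
# BASE_DOMAIN entry. The default glob is an assumed naming pattern.
configs_missing_base_domain() {
  local dir="$1"
  local pattern="${2:-*__stage-testing.yaml}"
  local f
  # $pattern is left unquoted on purpose so the shell expands the glob.
  for f in "$dir"/$pattern; do
    [ -e "$f" ] || continue
    grep -q 'BASE_DOMAIN:' "$f" || echo "$f"
  done
}

# Usage (the directory path is an assumption about the repo layout):
#   configs_missing_base_domain ci-operator/config/openshift/openshift-tests-private
```

An empty result after the fix would indicate that every matched config sets the variable.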

Fix: Add BASE_DOMAIN to each config's steps.env, matching the value used by the automated-release jobs for the same cluster profile:

| Versions | Cluster Profile | BASE_DOMAIN |
|----------|-----------------|-------------|
| 4.12–4.16 | aws-sustaining-autorelease-412 | sustaining-aws-412.devcluster.openshift.com |
| 4.17–4.22 | aws-autorelease-qe | qe.devcluster.openshift.com |

For example, 4.12–4.16 configs:

tests:
- as: e2e-aws-ipi
  cron: 1 * 31 2 *
  steps:
    cluster_profile: aws-sustaining-autorelease-412
    env:
      BASE_DOMAIN: sustaining-aws-412.devcluster.openshift.com
    workflow: cucushift-installer-rehearse-aws-ipi-stage-testing

And 4.17–4.22 configs:

tests:
- as: e2e-aws-ipi
  cron: 1 * 31 2 *
  steps:
    cluster_profile: aws-autorelease-qe
    env:
      BASE_DOMAIN: qe.devcluster.openshift.com
    workflow: cucushift-installer-rehearse-aws-ipi-stage-testing

Please update all 11 configs and run make update to regenerate the job files.
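
When updating all 11 configs, the version split can be captured in a small lookup. The sketch below assumes the two ranges listed in this comment; stage_profile_for is a hypothetical helper name, not something that exists in the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<cluster_profile> <BASE_DOMAIN>" for an OCP
# release, following the two version ranges given in this comment.
stage_profile_for() {
  local version="$1"
  local minor="${version#4.}"
  # Reject anything that is not "4.<digits>".
  case "$minor" in
    ''|*[!0-9]*) echo "unsupported release: $version" >&2; return 1 ;;
  esac
  if [ "$minor" -ge 12 ] && [ "$minor" -le 16 ]; then
    echo "aws-sustaining-autorelease-412 sustaining-aws-412.devcluster.openshift.com"
  elif [ "$minor" -ge 17 ] && [ "$minor" -le 22 ]; then
    echo "aws-autorelease-qe qe.devcluster.openshift.com"
  else
    echo "unsupported release: $version" >&2
    return 1
  fi
}

# Example: read both values for one release.
read -r profile domain <<<"$(stage_profile_for 4.21)"
echo "cluster_profile=${profile} BASE_DOMAIN=${domain}"
```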

@tomasdavidorg
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.21-stage-testing-e2e-aws-ipi

@openshift-ci-robot
Contributor

@tomasdavidorg: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@rioliu-rh
Contributor

Issue: Missing Stage CatalogSource Setup

The workflow cucushift-installer-rehearse-aws-ipi-stage-testing is missing the stage catalogsource setup step in the pre phase. This is required to configure the cluster to pull operator content from the stage registry before tests run — equivalent to what the Jenkins/Flexy job does via use_stage_index_catalogsource.sh.

Note: The existing enable-stage-catalogsource step in the registry should NOT be used here — it is outdated, uses deprecated ImageContentSourcePolicy unconditionally, and pulls from registry.stage.redhat.io/redhat/redhat-operator-index (requiring stage registry credentials). It does not match the current Flexy approach.


Proposed fix: create a new step enable-stage-fbc-catalogsource

Credentials needed (ref.yaml):

  • deploy-konflux-operator-art-image-share in test-credentials namespace (same secret already used by enable-qe-catalogsource and deploy-konflux-operator steps) — provides quay.io/openshift-art pull auth

Commands script logic:

  1. Update cluster pull-secret: merge quay.io/openshift-art credentials from the mounted secret into the cluster global pull-secret (openshift-config/pull-secret). Wait for MCP rollout to complete.

  2. Create mirror policy, version-gated by OCP version:

    • OCP 4.12 (kube_minor ≤ 26): Create ImageContentSourcePolicy mirroring registry.redhat.io → registry.stage.redhat.io
    • OCP 4.13+ (kube_minor ≥ 27): Create ImageDigestMirrorSet instead (IDMS is the modern replacement for deprecated ICSP)

  3. Check marketplace namespace: create openshift-marketplace namespace if absent (OLM has been optional since 4.11)

  4. Create CatalogSource using quay.io/openshift-art/stage-fbc-fragments:ocp-<X.Y>, version-gated:

    • OCP < 4.15 (kube_minor ≤ 27): Plain grpc CatalogSource (no extractContent)
    • OCP 4.15+ (kube_minor > 27): FBC format with grpcPodConfig.extractContent (cacheDir + catalogDir)

  5. Wait for CatalogSource READY: poll with timeout and emit debug output on failure
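
The version gates in steps 2 and 4 could be sketched roughly as follows. This is an illustration under stated assumptions, not the step's actual script: the helper names are hypothetical, the kube_minor thresholds are taken exactly as written in the list, the `ocp-<X.Y>` tag is assumed to expand to something like `ocp-4.16`, and the cacheDir/catalogDir values are placeholders.

```shell
#!/usr/bin/env bash
# In the step, the raw minor would presumably come from something like:
#   oc version -o json | jq -r '.serverVersion.minor'
# It can carry a "+" suffix (e.g. "27+"), so strip everything after the digits.
normalize_kube_minor() {
  local raw="$1"
  echo "${raw%%[!0-9]*}"
}

# Step 2: choose the mirror policy kind from the kube_minor thresholds above.
mirror_kind_for() {
  if [ "$1" -le 26 ]; then
    echo "ImageContentSourcePolicy"
  else
    echo "ImageDigestMirrorSet"
  fi
}

# Step 4: render the CatalogSource; extractContent is only added when
# kube_minor > 27. Arguments: name, OCP version (X.Y), kube_minor.
render_stage_catalogsource() {
  local name="$1" ocp_version="$2" kube_minor="$3"
  cat <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ${name}
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/openshift-art/stage-fbc-fragments:ocp-${ocp_version}
EOF
  if [ "$kube_minor" -gt 27 ]; then
    # Placeholder paths; the real values depend on the catalog image layout.
    cat <<EOF
  grpcPodConfig:
    extractContent:
      cacheDir: /tmp/cache
      catalogDir: /configs
EOF
  fi
}

# Usage (the catalog name here is a hypothetical example):
#   render_stage_catalogsource example-stage-catalog 4.16 "$(normalize_kube_minor '29+')" | oc apply -f -
```

Keeping the rendering separate from the `oc apply` call makes the gating logic easy to check without a cluster.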


Updated workflow

pre:
- chain: cucushift-installer-rehearse-aws-ipi-provision
- ref: enable-stage-fbc-catalogsource    # add this
test:
- ref: openshift-extended-test
- ref: openshift-e2e-test-qe-report
post:
- chain: cucushift-installer-rehearse-aws-ipi-deprovision

@asood-rh
Contributor

@tomasdavidorg It is great that we are migrating the job from Jenkins to Prow. Have we thought of extending it to the chat bot so that we can run this job at will?

@tomasdavidorg
Contributor Author

Hi @asood-rh, it will be part of the job tool https://github.com/openshift/release-tests/tree/main/prow, so you can run the job with something like job run_stage_testing --payload quay.io/openshift-release-dev/ocp-release:4.y.z-x86_64.
The OAR tool and ert-release-bot (Slack) can also execute the job via oar -r 4.y.z stage-testing.

Do you mean any other specific chat bot?

@rioliu-rh
Contributor

Issue: Conflicting CatalogSource Name in Stage Testing Workflow

The new enable-stage-fbc-catalogsource step needs to be added to the pre phase after cucushift-installer-rehearse-aws-ipi-provision, but there is a name conflict that must be handled first.

The cucushift-installer-rehearse-aws-ipi-provision chain already runs enable-qe-catalogsource as its last step, which creates a CatalogSource named qe-app-registry in openshift-marketplace. Since the stage-testing automation code hard-codes the catalogsource name as qe-app-registry, the new stage FBC catalogsource must use the same name.

Fix: The enable-stage-fbc-catalogsource step script must delete the existing qe-app-registry catalogsource before creating the stage one:

# Delete the QE catalogsource created by enable-qe-catalogsource
oc delete catalogsource qe-app-registry -n openshift-marketplace --ignore-not-found=true
oc wait --for=delete catalogsource/qe-app-registry -n openshift-marketplace --timeout=120s || true

# Then create the stage FBC catalogsource with the same name
oc create -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: qe-app-registry
  namespace: openshift-marketplace
...
EOF

This is safe because the deletion happens in pre phase before any tests run, so no Subscription will have been created referencing qe-app-registry yet.
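
After the catalogsource is recreated, the step still needs to wait for it to become READY before tests start. A minimal polling sketch is shown below, assuming the state is read from the CatalogSource's status.connectionState.lastObservedState field. The function name wait_for_catalogsource_ready and the default timeout and interval are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a CatalogSource until its connection state is
# READY or the timeout expires.
# Arguments: name, timeout in seconds (default 600), poll interval (default 10).
wait_for_catalogsource_ready() {
  local name="$1" timeout="${2:-600}" interval="${3:-10}"
  local ns="openshift-marketplace"
  local deadline=$(( SECONDS + timeout ))
  local state=""
  while :; do
    state="$(oc get catalogsource "$name" -n "$ns" \
      -o jsonpath='{.status.connectionState.lastObservedState}' 2>/dev/null || true)"
    if [ "$state" = "READY" ]; then
      echo "CatalogSource $name is READY"
      return 0
    fi
    if [ "$SECONDS" -ge "$deadline" ]; then
      break
    fi
    sleep "$interval"
  done
  echo "CatalogSource $name not READY after ${timeout}s (last state: ${state:-unknown})" >&2
  # Emit debug output on failure.
  oc get catalogsource "$name" -n "$ns" -o yaml >&2 || true
  oc get pods -n "$ns" -o wide >&2 || true
  return 1
}

# Usage, after the delete-and-create sequence:
#   wait_for_catalogsource_ready qe-app-registry 600 10
```

Returning non-zero on timeout lets the step fail in the pre phase instead of surfacing later as operator install failures in the tests.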

@rioliu-rh
Contributor

@asood-rh The cluster-bot is not a good fit here. It is designed to provision ephemeral clusters on demand for developer testing/debugging, not to run full e2e workflows against a specific release payload.

Stage-testing is an end-to-end workflow — it provisions a cluster, configures the stage catalog, runs the Stagerun test suite, and tears everything down. The right tools to trigger it on demand are:

  • OAR CLI: oar -r 4.y.z stage-testing
  • ERT Slack bot: trigger directly from Slack without needing CLI access

Both support specifying the exact release version to test against, which is the key requirement for stage-testing.

@rioliu-rh
Contributor

@tomasdavidorg The presubmit job step-registry-shellcheck failed due to a script issue. I think it was fixed in #76533. Can you rebase the code and retry?

@liangxia
Member

/test step-registry-shellcheck

@tomasdavidorg
Contributor Author

The test passed, but I am rebasing anyway. I will take a look at the "Missing Stage CatalogSource Setup" issue.

@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@tomasdavidorg: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
|-----------|------|------|--------|
| periodic-ci-openshift-openshift-tests-private-release-4.12-stage-testing-e2e-aws-ipi | N/A | periodic | Periodic changed |
| periodic-ci-openshift-openshift-tests-private-release-4.22-stage-testing-e2e-aws-ipi | N/A | periodic | Periodic changed |
| periodic-ci-openshift-openshift-tests-private-release-4.19-stage-testing-e2e-aws-ipi | N/A | periodic | Periodic changed |
| periodic-ci-openshift-openshift-tests-private-release-4.16-stage-testing-e2e-aws-ipi | N/A | periodic | Periodic changed |
| periodic-ci-openshift-openshift-tests-private-release-4.15-stage-testing-e2e-aws-ipi | N/A | periodic | Periodic changed |
| periodic-ci-openshift-openshift-tests-private-release-4.13-stage-testing-e2e-aws-ipi | N/A | periodic | Periodic changed |
| periodic-ci-openshift-openshift-tests-private-release-4.20-stage-testing-e2e-aws-ipi | N/A | periodic | Periodic changed |
| periodic-ci-openshift-openshift-tests-private-release-4.18-stage-testing-e2e-aws-ipi | N/A | periodic | Periodic changed |
| periodic-ci-openshift-openshift-tests-private-release-4.21-stage-testing-e2e-aws-ipi | N/A | periodic | Periodic changed |
| periodic-ci-openshift-openshift-tests-private-release-4.14-stage-testing-e2e-aws-ipi | N/A | periodic | Periodic changed |
| periodic-ci-openshift-openshift-tests-private-release-4.17-stage-testing-e2e-aws-ipi | N/A | periodic | Periodic changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-ci
Contributor

openshift-ci bot commented Mar 20, 2026

@tomasdavidorg: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
